Step 1 - Installing the plotly module

In [1]:
! pip install plotly
Requirement already satisfied: plotly in c:\users\prave\anaconda3\lib\site-packages (5.9.0)
Requirement already satisfied: tenacity>=6.2.0 in c:\users\prave\anaconda3\lib\site-packages (from plotly) (8.0.1)
WARNING: Ignoring invalid distribution -ygments (c:\users\prave\anaconda3\lib\site-packages)

Importing the Required Libraries

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
import re

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

Reading the Dataset

In [3]:
# loading the data set

df=pd.read_csv(r"C:\Users\prave\Downloads\dataset.csv")

Dataset Description

VIN (1-10) - The first 10 characters of the Vehicle Identification Number of each vehicle.

County - Name of the county associated with the vehicle record.

City - Name of the city associated with the vehicle record.

State - Name of the state associated with the vehicle record.

Postal Code - Postal code associated with the vehicle record.

Model Year - Model year of the vehicle.

Make - Manufacturer of the vehicle.

Model - Model name of the vehicle.

Electric Vehicle Type - Whether the vehicle is a Battery Electric Vehicle (BEV) or a Plug-in Hybrid Electric Vehicle (PHEV).

Clean Alternative Fuel Vehicle (CAFV) Eligibility - Whether the vehicle is eligible as a Clean Alternative Fuel Vehicle, for example "Clean Alternative Fuel Vehicle Eligible" or "Not eligible due to low battery range".

Vehicle Location - Longitude and latitude of the vehicle location, stored as text such as POINT (-122.20596 47.97659).

Electric Range - Distance the vehicle can travel on electric power alone (interpreted as miles in the analysis below).

In [4]:
# Sample data to understand the Data

df.head()
Out[4]:
VIN (1-10) County City State Postal Code Model Year Make Model Electric Vehicle Type Clean Alternative Fuel Vehicle (CAFV) Eligibility Electric Range Base MSRP Legislative District DOL Vehicle ID Vehicle Location Electric Utility 2020 Census Tract
0 JTMEB3FV6N Monroe Key West FL 33040 2022 TOYOTA RAV4 PRIME Plug-in Hybrid Electric Vehicle (PHEV) Clean Alternative Fuel Vehicle Eligible 42 0 NaN 198968248 POINT (-81.80023 24.5545) NaN 12087972100
1 1G1RD6E45D Clark Laughlin NV 89029 2013 CHEVROLET VOLT Plug-in Hybrid Electric Vehicle (PHEV) Clean Alternative Fuel Vehicle Eligible 38 0 NaN 5204412 POINT (-114.57245 35.16815) NaN 32003005702
2 JN1AZ0CP8B Yakima Yakima WA 98901 2011 NISSAN LEAF Battery Electric Vehicle (BEV) Clean Alternative Fuel Vehicle Eligible 73 0 15.0 218972519 POINT (-120.50721 46.60448) PACIFICORP 53077001602
3 1G1FW6S08H Skagit Concrete WA 98237 2017 CHEVROLET BOLT EV Battery Electric Vehicle (BEV) Clean Alternative Fuel Vehicle Eligible 238 0 39.0 186750406 POINT (-121.7515 48.53892) PUGET SOUND ENERGY INC 53057951101
4 3FA6P0SU1K Snohomish Everett WA 98201 2019 FORD FUSION Plug-in Hybrid Electric Vehicle (PHEV) Not eligible due to low battery range 26 0 38.0 2006714 POINT (-122.20596 47.97659) PUGET SOUND ENERGY INC 53061041500
In [5]:
# columns in dataframe

df.columns
Out[5]:
Index(['VIN (1-10)', 'County', 'City', 'State', 'Postal Code', 'Model Year',
       'Make', 'Model', 'Electric Vehicle Type',
       'Clean Alternative Fuel Vehicle (CAFV) Eligibility', 'Electric Range',
       'Base MSRP', 'Legislative District', 'DOL Vehicle ID',
       'Vehicle Location', 'Electric Utility', '2020 Census Tract'],
      dtype='object')
In [6]:
# Info of the data

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 112634 entries, 0 to 112633
Data columns (total 17 columns):
 #   Column                                             Non-Null Count   Dtype  
---  ------                                             --------------   -----  
 0   VIN (1-10)                                         112634 non-null  object 
 1   County                                             112634 non-null  object 
 2   City                                               112634 non-null  object 
 3   State                                              112634 non-null  object 
 4   Postal Code                                        112634 non-null  int64  
 5   Model Year                                         112634 non-null  int64  
 6   Make                                               112634 non-null  object 
 7   Model                                              112614 non-null  object 
 8   Electric Vehicle Type                              112634 non-null  object 
 9   Clean Alternative Fuel Vehicle (CAFV) Eligibility  112634 non-null  object 
 10  Electric Range                                     112634 non-null  int64  
 11  Base MSRP                                          112634 non-null  int64  
 12  Legislative District                               112348 non-null  float64
 13  DOL Vehicle ID                                     112634 non-null  int64  
 14  Vehicle Location                                   112610 non-null  object 
 15  Electric Utility                                   112191 non-null  object 
 16  2020 Census Tract                                  112634 non-null  int64  
dtypes: float64(1), int64(6), object(10)
memory usage: 14.6+ MB
In [7]:
# checking null values in data

df.isnull().sum()
Out[7]:
VIN (1-10)                                             0
County                                                 0
City                                                   0
State                                                  0
Postal Code                                            0
Model Year                                             0
Make                                                   0
Model                                                 20
Electric Vehicle Type                                  0
Clean Alternative Fuel Vehicle (CAFV) Eligibility      0
Electric Range                                         0
Base MSRP                                              0
Legislative District                                 286
DOL Vehicle ID                                         0
Vehicle Location                                      24
Electric Utility                                     443
2020 Census Tract                                      0
dtype: int64

Checking for Duplicate Values

In [8]:
df.duplicated().sum()
Out[8]:
0
In [9]:
df
Out[9]:
VIN (1-10) County City State Postal Code Model Year Make Model Electric Vehicle Type Clean Alternative Fuel Vehicle (CAFV) Eligibility Electric Range Base MSRP Legislative District DOL Vehicle ID Vehicle Location Electric Utility 2020 Census Tract
0 JTMEB3FV6N Monroe Key West FL 33040 2022 TOYOTA RAV4 PRIME Plug-in Hybrid Electric Vehicle (PHEV) Clean Alternative Fuel Vehicle Eligible 42 0 NaN 198968248 POINT (-81.80023 24.5545) NaN 12087972100
1 1G1RD6E45D Clark Laughlin NV 89029 2013 CHEVROLET VOLT Plug-in Hybrid Electric Vehicle (PHEV) Clean Alternative Fuel Vehicle Eligible 38 0 NaN 5204412 POINT (-114.57245 35.16815) NaN 32003005702
2 JN1AZ0CP8B Yakima Yakima WA 98901 2011 NISSAN LEAF Battery Electric Vehicle (BEV) Clean Alternative Fuel Vehicle Eligible 73 0 15.0 218972519 POINT (-120.50721 46.60448) PACIFICORP 53077001602
3 1G1FW6S08H Skagit Concrete WA 98237 2017 CHEVROLET BOLT EV Battery Electric Vehicle (BEV) Clean Alternative Fuel Vehicle Eligible 238 0 39.0 186750406 POINT (-121.7515 48.53892) PUGET SOUND ENERGY INC 53057951101
4 3FA6P0SU1K Snohomish Everett WA 98201 2019 FORD FUSION Plug-in Hybrid Electric Vehicle (PHEV) Not eligible due to low battery range 26 0 38.0 2006714 POINT (-122.20596 47.97659) PUGET SOUND ENERGY INC 53061041500
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
112629 7SAYGDEF2N King Duvall WA 98019 2022 TESLA MODEL Y Battery Electric Vehicle (BEV) Eligibility unknown as battery range has not b... 0 0 45.0 217955265 POINT (-121.98609 47.74068) PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA) 53033032401
112630 1N4BZ1CP7K San Juan Friday Harbor WA 98250 2019 NISSAN LEAF Battery Electric Vehicle (BEV) Clean Alternative Fuel Vehicle Eligible 150 0 40.0 103663227 POINT (-123.01648 48.53448) BONNEVILLE POWER ADMINISTRATION||ORCAS POWER &... 53055960301
112631 1FMCU0KZ4N King Vashon WA 98070 2022 FORD ESCAPE Plug-in Hybrid Electric Vehicle (PHEV) Clean Alternative Fuel Vehicle Eligible 38 0 34.0 193878387 POINT (-122.4573 47.44929) PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA) 53033027702
112632 KNDCD3LD4J King Covington WA 98042 2018 KIA NIRO Plug-in Hybrid Electric Vehicle (PHEV) Not eligible due to low battery range 26 0 47.0 125039043 POINT (-122.09124 47.33778) PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA) 53033032007
112633 YV4BR0CL8N King Covington WA 98042 2022 VOLVO XC90 Plug-in Hybrid Electric Vehicle (PHEV) Not eligible due to low battery range 18 0 47.0 194673692 POINT (-122.09124 47.33778) PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA) 53033032005

112634 rows × 17 columns

In [10]:
# Fill missing values with explicit placeholders
df['Electric Utility'] = df['Electric Utility'].fillna('Utility Not Available')
df['Legislative District'] = df['Legislative District'].fillna('Unknown')
df['Vehicle Location'] = df['Vehicle Location'].fillna('Unknown')
df['Model'] = df['Model'].fillna('Unknown')
# These two columns have no missing values, so the fills below leave them unchanged
df['2020 Census Tract'] = df['2020 Census Tract'].fillna('Unknown')
df['City'] = df['City'].fillna('Unknown')
In [11]:
df['Postal Code'] = df['Postal Code'].astype(int)
In [12]:
df.shape
Out[12]:
(112634, 17)
In [13]:
df.isna().sum()
Out[13]:
VIN (1-10)                                           0
County                                               0
City                                                 0
State                                                0
Postal Code                                          0
Model Year                                           0
Make                                                 0
Model                                                0
Electric Vehicle Type                                0
Clean Alternative Fuel Vehicle (CAFV) Eligibility    0
Electric Range                                       0
Base MSRP                                            0
Legislative District                                 0
DOL Vehicle ID                                       0
Vehicle Location                                     0
Electric Utility                                     0
2020 Census Tract                                    0
dtype: int64
In [14]:
# Checking the shape 

df.shape
Out[14]:
(112634, 17)
In [15]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 112634 entries, 0 to 112633
Data columns (total 17 columns):
 #   Column                                             Non-Null Count   Dtype 
---  ------                                             --------------   ----- 
 0   VIN (1-10)                                         112634 non-null  object
 1   County                                             112634 non-null  object
 2   City                                               112634 non-null  object
 3   State                                              112634 non-null  object
 4   Postal Code                                        112634 non-null  int32 
 5   Model Year                                         112634 non-null  int64 
 6   Make                                               112634 non-null  object
 7   Model                                              112634 non-null  object
 8   Electric Vehicle Type                              112634 non-null  object
 9   Clean Alternative Fuel Vehicle (CAFV) Eligibility  112634 non-null  object
 10  Electric Range                                     112634 non-null  int64 
 11  Base MSRP                                          112634 non-null  int64 
 12  Legislative District                               112634 non-null  object
 13  DOL Vehicle ID                                     112634 non-null  int64 
 14  Vehicle Location                                   112634 non-null  object
 15  Electric Utility                                   112634 non-null  object
 16  2020 Census Tract                                  112634 non-null  int64 
dtypes: int32(1), int64(5), object(11)
memory usage: 14.2+ MB
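
The `df.info()` output above lists `Legislative District` as `object`, where it was `float64` before the fill. The sketch below shows why on a small made-up frame (the three values are illustrative; only the column name comes from the dataset): filling a numeric column with the string 'Unknown' makes pandas store mixed Python objects. Keeping the missing values and using the nullable `Int64` dtype is one possible alternative, not what this notebook does.

```python
import pandas as pd

# Hypothetical three-row sample; only the column name mirrors the dataset.
sample = pd.DataFrame({"Legislative District": [15.0, None, 39.0]})

# Filling with a string leaves a column of mixed objects (floats and text).
# Casting to object first makes that intent explicit.
filled_with_text = sample["Legislative District"].astype(object).fillna("Unknown")

# Alternative: keep the missing value and use the nullable integer dtype.
kept_numeric = sample["Legislative District"].astype("Int64")

print(filled_with_text.dtype)      # dtype of the text-filled column
print(kept_numeric.dtype)          # dtype of the nullable integer column
print(kept_numeric.isna().sum())   # number of values still marked missing
```

The text-filled column can no longer be averaged or compared numerically, which does not matter here because the column is dropped in the next step.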

Dropping the Unwanted Columns

In [16]:
df.drop(['Postal Code','Base MSRP','Legislative District','DOL Vehicle ID','Electric Utility','2020 Census Tract'],axis=1,inplace=True)

Task - 1

Which company makes the most electric vehicles?

In [17]:
# Count rows (registered vehicles) per make; the 'City' column has no missing values,
# so its count equals the number of rows for each make
company_counts = df.groupby('Make').count().sort_values(by='City', ascending=False)['City'].reset_index()
top_10 = company_counts[:10]

# Create the bar chart
fig = px.bar(top_10, x='Make', y='City', labels={'Make': 'Companies', 'City': 'Count'},
             title='Top 10 Electric Vehicle Companies by Number of Registered Vehicles', color='City',
             color_continuous_scale='Viridis')

# Show the plot
fig.show()

This plot shows the top 10 electric vehicle companies by number of registered vehicles in the dataset. The y-axis counts rows per make, not distinct cities.

Tesla is the clear leader, with far more registered vehicles than any other make.

Nissan is the second most popular electric vehicle company.

Chevrolet is the third most popular electric vehicle company.

Ford is the fourth most popular electric vehicle company.

BMW is the fifth most popular electric vehicle company.
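
Counting vehicles per make and counting distinct cities per make are different questions. The sketch below shows both on a small made-up frame (the rows are hypothetical; only the column names match the dataset). `value_counts` gives the same row count as the `groupby('Make').count()['City']` pattern used above, while `nunique` gives the number of distinct cities.

```python
import pandas as pd

# Hypothetical rows; only the column names mirror the dataset.
sample = pd.DataFrame({
    "Make": ["TESLA", "TESLA", "TESLA", "NISSAN", "NISSAN"],
    "City": ["Seattle", "Seattle", "Bellevue", "Yakima", "Yakima"],
})

# Rows per make, i.e. number of registered vehicles.
vehicles_per_make = sample["Make"].value_counts()

# Distinct cities per make, a different quantity.
cities_per_make = sample.groupby("Make")["City"].nunique()

print(vehicles_per_make.to_dict())
print(cities_per_make.to_dict())
```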

Which 10 counties have the most electric vehicles?

In [18]:
Counties = df.groupby('County').count().sort_values(by='City',ascending=False)['City'].index
values = df.groupby('County').count().sort_values(by='City',ascending=False)['City'].values

px.bar(x=list(Counties)[:10],y=values[:10],labels={'x':"County Name",'y':"Number of Cars"},color=values[:10])

The bar plot shows the 10 counties with the most electric vehicles in the dataset.

King County has the most electric vehicles, followed by Snohomish County and Pierce County.

Top 10 Vehicle Makes and Models

In [19]:
Companies = df.groupby('Make').count().sort_values(by='City',ascending=False)['City'].index
values = df.groupby('Make').count().sort_values(by='City',ascending=False)['City'].values
top_n = 10  
top_companies = company_counts[:top_n].reset_index() 
fig = px.bar(top_companies, x='Make', y='City', labels={'Make': 'Companies', 'City': 'Count'},
             title='Top Companies Producing Electric Vehicles', color='City',
             color_continuous_scale='Viridis') 
fig.update_layout(xaxis_tickangle=-45) 

fig.show() 
In [20]:
px.pie(names=list(Companies)[:10],values=values[:10],width=500,height=400)

The top 10 electric vehicle models, ranked by number of registered vehicles (see the bar chart in In [21]), give the following insights:

The Tesla Model 3 is the most popular electric vehicle model.

The Tesla Model Y is the second most popular electric vehicle model.

The Nissan LEAF is the third most popular electric vehicle model.

The Tesla Model S and Model X are also popular models.

What is the best-selling model for each company?

In [21]:
# Get the top 10 models by number of registered vehicles (rows per model)
model_counts = df.groupby('Model').count().sort_values(by='City', ascending=False)['City'].reset_index()
top_10 = model_counts[:10]

# Create the bar chart
fig = px.bar(top_10, x='Model', y='City', labels={'Model': 'Models', 'City': 'Count'},
             title='Top 10 Electric Vehicle Models by Number of Registered Vehicles', color='City',
             color_continuous_scale='Viridis')

# Show the plot
fig.show()
In [22]:
top_10_companies = list(Companies)[:10]
for i in top_10_companies:
    data = df[df['Make']==i]
    data = data.groupby('Model').count().sort_values(by='City',ascending=False).index
    print('Top selling model for',i,'is ----------->',data[0])
Top selling model for TESLA is -----------> MODEL 3
Top selling model for NISSAN is -----------> LEAF
Top selling model for CHEVROLET is -----------> BOLT EV
Top selling model for FORD is -----------> FUSION
Top selling model for BMW is -----------> I3
Top selling model for KIA is -----------> NIRO
Top selling model for TOYOTA is -----------> PRIUS PRIME
Top selling model for VOLKSWAGEN is -----------> ID.4
Top selling model for AUDI is -----------> E-TRON
Top selling model for VOLVO is -----------> XC90
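
The loop above filters the frame once per company. As a sketch of an alternative, the same question can be answered in a single grouped pass. The rows below are made up for illustration, and ties between models are not handled.

```python
import pandas as pd

# Hypothetical rows; only the column names mirror the dataset.
sample = pd.DataFrame({
    "Make":  ["TESLA", "TESLA", "TESLA", "NISSAN", "FORD", "FORD", "FORD"],
    "Model": ["MODEL 3", "MODEL 3", "MODEL Y", "LEAF", "FUSION", "FUSION", "ESCAPE"],
})

# For each make, count its models and keep the most frequent one.
top_model = sample.groupby("Make")["Model"].agg(lambda s: s.value_counts().idxmax())

print(top_model.to_dict())
```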
In [23]:
#Percentage of BEV vs PHEV
Vehicle_type = list(df.groupby('Electric Vehicle Type').count()['County'].index)
values = df.groupby('Electric Vehicle Type').count()['County'].values

px.pie(names=Vehicle_type,values=values,height=400)

The majority of the vehicles are battery electric vehicles (BEVs).

In [24]:
# What percentage of each top 10 company's vehicles are BEVs and PHEVs?

for index,i in enumerate(top_10_companies):
    data = df[df['Make']==i]
    labels = list(data.groupby('Electric Vehicle Type').count()['City'].index)
    values = list(data.groupby('Electric Vehicle Type').count()['City'].values)
    fig = px.pie(names=labels,values=values,width=700,height=400,title=str(i))
    fig.show()

These plots show the distribution of electric vehicle types for each of the top 10 companies:

Tesla, Nissan and Volkswagen produce mostly battery electric vehicles (BEVs).

The other companies lean more toward plug-in hybrid electric vehicles (PHEVs).
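
The ten pie charts can also be summarised as one table of shares. The sketch below uses `pd.crosstab` with `normalize='index'` on a few made-up rows (hypothetical data; the column names and type labels follow the dataset), so each make's row sums to 1.

```python
import pandas as pd

# Hypothetical rows; only the column names and type labels mirror the dataset.
sample = pd.DataFrame({
    "Make": ["TESLA", "TESLA", "FORD", "FORD", "FORD", "FORD"],
    "Electric Vehicle Type": [
        "Battery Electric Vehicle (BEV)",
        "Battery Electric Vehicle (BEV)",
        "Plug-in Hybrid Electric Vehicle (PHEV)",
        "Plug-in Hybrid Electric Vehicle (PHEV)",
        "Plug-in Hybrid Electric Vehicle (PHEV)",
        "Battery Electric Vehicle (BEV)",
    ],
})

# normalize='index' turns each row (make) into shares that sum to 1.
share = pd.crosstab(sample["Make"], sample["Electric Vehicle Type"], normalize="index")

print((share * 100).round(1))
```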

In [25]:
year_wise_cars = df.groupby('Model Year')['VIN (1-10)'].count().reset_index()
year_wise_cars.columns = ['year','num_cars']
fig = px.line(year_wise_cars,x="year", y="num_cars", title='Year Wise Number of Cars',markers=True)
fig.show()
In [26]:
year_wise_cars.sort_values(by='num_cars', ascending=False).head(10)
Out[26]:
year num_cars
18 2022 26530
17 2021 18364
14 2018 14246
16 2020 11038
15 2019 10266
13 2017 8644
12 2016 5735
11 2015 4940
9 2013 4691
10 2014 3685

The line chart shows the number of electric vehicles in the dataset by model year. The insights that we can draw from this chart are:

The number of vehicles per model year grows strongly overall, from 4,691 for model year 2013 to 26,530 for model year 2022.

Growth is not monotonic: the count dips for model year 2014 (3,685) and again for 2019 (10,266) after a local peak in 2018 (14,246).

Model years 2021 (18,364) and 2022 (26,530) have the highest counts.
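
To put numbers on the trend, the sketch below computes the change from one model year to the next. The counts are copied from the Out[26] table above; the variable names are illustrative.

```python
import pandas as pd

# Counts copied from the Out[26] table above (model years 2013 to 2022).
num_cars = pd.Series(
    [4691, 3685, 4940, 5735, 8644, 14246, 10266, 11038, 18364, 26530],
    index=range(2013, 2023),
    name="num_cars",
)

change = num_cars.diff()                    # absolute change vs previous model year
pct_change = num_cars.pct_change() * 100    # relative change in percent

summary = pd.DataFrame({
    "num_cars": num_cars,
    "change": change,
    "pct_change": pct_change.round(1),
})
print(summary)

# Model years with fewer vehicles than the previous model year.
declining_years = summary.index[summary["change"] < 0].tolist()
print(declining_years)
```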

In [27]:
car_counts_St = df['State'].value_counts().nlargest(10)

fig = px.bar(car_counts_St, x=car_counts_St.index, y=car_counts_St.values,
             labels={'x': 'State', 'y': 'Number of Cars (log scale)'},
             title='Top 10 Count of Cars per State',
             template='plotly_dark')

fig.update_layout(yaxis_type='log')

fig.update_traces(marker_color='steelblue')


fig.show()
car_counts_St_df = car_counts_St.to_frame()
car_counts_St_df.style.background_gradient(cmap='Blues')
Out[27]:
  State
WA 112348
CA 76
VA 36
MD 26
TX 14
CO 9
NV 8
GA 7
NC 7
CT 6

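Washington (WA) accounts for almost all vehicles in the dataset: 112,348 of 112,634 rows. Every other state has fewer than 100 vehicles.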
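
A quick arithmetic check of how dominant Washington is, using the row total from `df.shape` and the WA count from Out[27] (both copied from the outputs above):

```python
# Counts copied from the outputs above.
total_rows = 112634   # first value of df.shape
wa_rows = 112348      # 'WA' entry in Out[27]

wa_share = wa_rows / total_rows
outside_wa = total_rows - wa_rows

print(f"WA share: {wa_share:.2%}")
print(f"Rows outside WA: {outside_wa}")
```

This is why a log scale is used on the y-axis of the state charts: on a linear scale the other states would be invisible next to Washington.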

In [28]:
cnt_MkCity = df.groupby(['City', 'Make']).size().reset_index(name='Count')

#  cnt_MkCity already has one row per (City, Make) pair, so this re-grouping leaves the counts unchanged
grouped_data_cty = cnt_MkCity.groupby(['City', 'Make'])['Count'].sum().reset_index()

#  Total number of cars per city and per make
city_counts = grouped_data_cty.groupby('City')['Count'].sum().reset_index()
make_counts = grouped_data_cty.groupby('Make')['Count'].sum().reset_index()

#  Sort the cities by count in descending order, and select the top 10
top_cities = city_counts.sort_values(by='Count', ascending=False).head(10)
top_makes = make_counts.sort_values(by='Count', ascending=False).head(10)

#  Filter the data to only include the top 10 cities and top 10 makes
filtered_data_Cty = grouped_data_cty[
    grouped_data_cty['City'].isin(top_cities['City']) & grouped_data_cty['Make'].isin(top_makes['Make'])
]


pivoted_data_cty = filtered_data_Cty.pivot(index='City', columns='Make', values='Count').fillna(0)

fig = go.Figure()

for make in top_makes['Make']:
    fig.add_trace(go.Bar(name=make, x=pivoted_data_cty.index, y=pivoted_data_cty[make]))

fig.update_layout(title='Top 10 Make distribution count per top 10 City',
                  xaxis_title='City',
                  yaxis_title='Number of Cars')


fig.show()

pivoted_data_cty.head()
Out[28]:
Make AUDI BMW CHEVROLET FORD KIA NISSAN TESLA TOYOTA VOLKSWAGEN VOLVO
City
Bellevue 120 295 211 131 131 527 3714 140 76 103
Bothell 48 119 183 128 98 374 1950 67 48 57
Kirkland 97 174 173 92 117 316 2112 70 70 91
Olympia 48 70 456 215 177 360 805 170 62 39
Redmond 70 168 189 110 112 460 2570 101 77 66

The grouped bar plot shows the distribution of electric vehicles by make in the 10 cities with the most vehicles in the dataset. The insights that we can draw from this plot are:

Tesla is the most popular make in each of the cities shown in the table above. Nissan is usually second, although in Olympia Chevrolet (456) is ahead of Nissan (360).

All of the top 10 makes are represented in these cities, so buyers are choosing electric vehicles from a variety of manufacturers.
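
Raw counts make large cities look dominant, so it can help to convert each city's row into shares. The sketch below does this for the five cities printed in Out[28] (counts copied from that table; the shares are relative to the ten makes shown, not to all makes in the city).

```python
import pandas as pd

# Counts copied from the Out[28] table above.
makes = ["AUDI", "BMW", "CHEVROLET", "FORD", "KIA",
         "NISSAN", "TESLA", "TOYOTA", "VOLKSWAGEN", "VOLVO"]
counts = pd.DataFrame(
    [
        [120, 295, 211, 131, 131, 527, 3714, 140, 76, 103],
        [48, 119, 183, 128, 98, 374, 1950, 67, 48, 57],
        [97, 174, 173, 92, 117, 316, 2112, 70, 70, 91],
        [48, 70, 456, 215, 177, 360, 805, 170, 62, 39],
        [70, 168, 189, 110, 112, 460, 2570, 101, 77, 66],
    ],
    index=["Bellevue", "Bothell", "Kirkland", "Olympia", "Redmond"],
    columns=makes,
)

# Share of each make within a city (each row sums to 1 across the ten makes).
share = counts.div(counts.sum(axis=1), axis=0)
print(share["TESLA"].round(3))

# Most common make per city, and the runner-up once Tesla is removed.
top_make = counts.idxmax(axis=1)
runner_up = counts.drop(columns="TESLA").idxmax(axis=1)
print(runner_up)
```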

In [29]:
import pandas as pd
import plotly.graph_objects as go

# Calculate the counts of cars for each state and make combination
cnt_Mk_St = df.groupby(['State', 'Make']).size().reset_index(name='Count')

#  cnt_Mk_St already has one row per (State, Make) pair, so this re-grouping leaves the counts unchanged
grouped_data_St = cnt_Mk_St.groupby(['State', 'Make'])['Count'].sum().reset_index()

#  Total number of cars per state and per make
st_counts = grouped_data_St.groupby('State')['Count'].sum().reset_index()
make_counts = grouped_data_St.groupby('Make')['Count'].sum().reset_index()

#  Sort the states by count in descending order, and select the top 10
top_States = st_counts.sort_values(by='Count', ascending=False).head(10)
top_makes = make_counts.sort_values(by='Count', ascending=False).head(10)

#  Filter the data to only include the top 10 states and top 10 makes
filtered_data_St = grouped_data_St[
    grouped_data_St['State'].isin(top_States['State']) & grouped_data_St['Make'].isin(top_makes['Make'])
]


pivoted_data_St = filtered_data_St.pivot(index='State', columns='Make', values='Count').fillna(0)

fig = go.Figure()

for make in top_makes['Make']:
    fig.add_trace(go.Bar(name=make, x=pivoted_data_St.index, y=pivoted_data_St[make]))


fig.update_layout(title='Top 10 Make distribution count per top 10 State',
                  xaxis_title='State',
                  yaxis_title='Number of Cars',
                  yaxis_type='log')  # Set y-axis to logarithmic scale

fig.show()


pivoted_data_St.head(10)
Out[29]:
Make AUDI BMW CHEVROLET FORD KIA NISSAN TESLA TOYOTA VOLKSWAGEN VOLVO
State
AZ 0.0 0.0 0.0 0.0 0.0 1.0 3.0 1.0 0.0 1.0
CA 2.0 1.0 3.0 7.0 1.0 2.0 40.0 6.0 2.0 4.0
CO 0.0 1.0 2.0 0.0 0.0 0.0 4.0 0.0 0.0 0.0
GA 0.0 1.0 0.0 1.0 1.0 1.0 1.0 0.0 1.0 1.0
MD 1.0 1.0 2.0 2.0 0.0 1.0 10.0 2.0 0.0 1.0
NC 0.0 1.0 1.0 0.0 1.0 0.0 3.0 0.0 0.0 0.0
NV 0.0 0.0 2.0 1.0 0.0 0.0 5.0 0.0 0.0 0.0
TX 0.0 0.0 0.0 4.0 0.0 0.0 9.0 1.0 0.0 0.0
VA 0.0 3.0 4.0 2.0 1.0 2.0 17.0 3.0 1.0 0.0
WA 2327.0 4665.0 10162.0 5795.0 4476.0 12866.0 51944.0 4384.0 2509.0 2281.0

This plot shows the make distribution of cars in the top 10 states.

Tesla has the highest count in every state shown except Georgia, where several makes tie at one vehicle each.

In [30]:
fig = px.histogram(df, x='Electric Range', color='Electric Vehicle Type',
                   nbins=30, barmode='overlay', histfunc='count', 
                   labels={'Electric Range': 'Electric Range', 'Electric Vehicle Type': 'Vehicle Type'},
                   title='Electric Vehicle Range Distribution by Vehicle Type')

# Show the plot
fig.show()

The histogram shows the distribution of electric range by vehicle type. The insights that we can draw from this plot are:

Electric range varies widely, from 0 (which appears where the battery range is not known, as in row 112629 above) to over 300 miles.

Battery electric vehicles (BEVs) span a much wider range of values than plug-in hybrid electric vehicles (PHEVs).

BEV ranges center on roughly 200 miles, while PHEV ranges center on roughly 50 miles.

A few BEVs have ranges of over 300 miles, but most BEVs have ranges of less than 250 miles.
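
Averages read off a histogram are rough, and rows with a range of 0 pull them down. The sketch below shows one way to compute per-type statistics with and without the zeros. The seven range values are taken from the sample rows displayed earlier and are not representative of the full dataset; the short 'BEV' and 'PHEV' labels are abbreviations used only in this example.

```python
import pandas as pd

# A few values from the rows displayed earlier; not representative of the full data.
sample = pd.DataFrame({
    "Electric Vehicle Type": ["BEV", "BEV", "BEV", "BEV", "PHEV", "PHEV", "PHEV"],
    "Electric Range": [238, 150, 73, 0, 42, 38, 26],
})

# Rows where a range was actually recorded.
known = sample[sample["Electric Range"] > 0]

with_zeros = sample.groupby("Electric Vehicle Type")["Electric Range"].mean()
without_zeros = known.groupby("Electric Vehicle Type")["Electric Range"].agg(
    ["mean", "median", "max"]
)

print(with_zeros.round(1))
print(without_zeros.round(1))
```

Running the same `groupby` on `df` would replace the visual estimates above with computed values.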

Task - 2

Number of vehicles per company for each model year since 2011

In [31]:
# Number of vehicles per company for each model year since 2011
data = df.copy()
data['top_10'] = data['Make'].apply(lambda x: 1 if x in top_10_companies else 0)
data = data[data['top_10'] == 1]
data = data[data['Model Year'] >= 2011]

#  Create the Count plot using Plotly
fig = px.histogram(data, x='Model Year', color='Make', barmode='group', labels={'Model Year': 'Model Year', 'Make': 'Manufacturer'},
                   title='Model Year Distribution for Top 10 Companies (Since 2011)',
                   template='ggplot2')

#  Show the plot
fig.show()

From model year 2018 onwards, the number of Tesla vehicles increases rapidly.

Earlier model years were led by Nissan and Chevrolet; Tesla then took over the market.
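
The histogram above counts vehicles, not distinct models. If the number of different models a company offers per model year is also of interest, a sketch along these lines would give both figures side by side (the rows are made up; only the column names match the dataset, and the output column names are illustrative).

```python
import pandas as pd

# Hypothetical rows; only the column names mirror the dataset.
sample = pd.DataFrame({
    "Model Year": [2018, 2018, 2018, 2022, 2022, 2022, 2022],
    "Make":       ["TESLA", "TESLA", "NISSAN", "TESLA", "TESLA", "TESLA", "NISSAN"],
    "Model":      ["MODEL 3", "MODEL 3", "LEAF", "MODEL 3", "MODEL Y", "MODEL Y", "LEAF"],
})

# 'count' gives vehicles per (year, make); 'nunique' gives distinct models.
summary = (
    sample.groupby(["Model Year", "Make"])["Model"]
    .agg(vehicles="count", distinct_models="nunique")
    .reset_index()
)

print(summary)
```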

In [32]:
import re
Location_data = df.groupby('Vehicle Location').count()['County'].reset_index()
Location_data.rename(columns={'Vehicle Location': 'Locations', 'County': 'Count'}, inplace=True)

#  'Locations' holds text such as 'POINT (-122.20596 47.97659)'.
#  The first number is the longitude and the second is the latitude.
def extract_longitude(location):
    try:
        numbers = re.findall(r'[-+]?\d+(?:\.\d+)?', location.split('(')[-1])
        return float(numbers[0])
    except (AttributeError, IndexError, ValueError):
        # e.g. the 'Unknown' placeholder used for missing locations
        return None

def extract_latitude(location):
    try:
        numbers = re.findall(r'[-+]?\d+(?:\.\d+)?', location.split('(')[-1])
        return float(numbers[1])
    except (AttributeError, IndexError, ValueError):
        return None

Location_data['Longitude'] = Location_data['Locations'].apply(extract_longitude)
Location_data['Latitude'] = Location_data['Locations'].apply(extract_latitude)

Location_data.dropna(subset=['Latitude', 'Longitude'], inplace=True)
In [33]:
# Longitude on the x-axis and latitude on the y-axis, so the plot reads like a map
fig = px.scatter(Location_data, x='Longitude', y='Latitude', size='Count', color='Count',
                 labels={'Latitude': 'Latitude', 'Longitude': 'Longitude', 'Count': 'Count'},
                 title='Vehicle Locations and Counts',
                 hover_data=['Locations', 'Count'])


fig.update_layout(xaxis_range=[-130, -60], yaxis_range=[20, 60])


fig.show()

The scatter plot shows the locations of electric vehicles in the dataset, with the size of each point representing the number of vehicles at that location. The insights that we can draw from this plot are:

The largest points are in the Pacific Northwest, which is expected because 112,348 of the 112,634 vehicles are in Washington. Points elsewhere, including those in the northeastern, southern and central United States, represent only a handful of vehicles each.

The size of the points shows that there is a wide variation in the number of electric vehicles at different locations.

Larger counts are likely to reflect population density, meaning that there are more electric vehicles in areas with more people. The dataset has no population column, so this is an interpretation rather than a measured correlation.
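
The `Vehicle Location` values follow the well-known text (WKT) point layout, `POINT (x y)`, where x is the longitude and y is the latitude. The first sample row confirms this: Key West, FL is stored as `POINT (-81.80023 24.5545)`, and 24.55 can only be its latitude. The sketch below parses such a string into a (longitude, latitude) pair with one regular expression; `parse_point` and `POINT_PATTERN` are hypothetical names used only for this example.

```python
import re

# WKT point layout: 'POINT (<longitude> <latitude>)'.
POINT_PATTERN = re.compile(
    r"POINT\s*\(\s*([-+]?\d+(?:\.\d+)?)\s+([-+]?\d+(?:\.\d+)?)\s*\)"
)

def parse_point(location):
    """Return (longitude, latitude) as floats, or None if the text is not a WKT point."""
    if not isinstance(location, str):
        return None
    match = POINT_PATTERN.search(location)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))

print(parse_point("POINT (-81.80023 24.5545)"))
print(parse_point("Unknown"))
```

Returning both coordinates from one function keeps their order in a single place, which makes it harder to swap them by accident.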

In [34]:
df_copy = df.copy()

#  Extract longitude and latitude from 'Vehicle Location'.
#  The values are text such as 'POINT (-122.20596 47.97659)':
#  the first number is the longitude and the second is the latitude.
def extract_longitude(location):
    try:
        numbers = re.findall(r'[-+]?\d+(?:\.\d+)?', location.split('(')[-1])
        return float(numbers[0])
    except (AttributeError, IndexError, ValueError):
        # e.g. the 'Unknown' placeholder used for missing locations
        return None

def extract_latitude(location):
    try:
        numbers = re.findall(r'[-+]?\d+(?:\.\d+)?', location.split('(')[-1])
        return float(numbers[1])
    except (AttributeError, IndexError, ValueError):
        return None

df_copy['Longitude'] = df_copy['Vehicle Location'].apply(extract_longitude)
df_copy['Latitude'] = df_copy['Vehicle Location'].apply(extract_latitude)
df_copy.dropna(subset=['Latitude', 'Longitude'], inplace=True)
In [35]:
fig = px.scatter(df_copy, x='Longitude', y='Latitude', color='Clean Alternative Fuel Vehicle (CAFV) Eligibility',
                 labels={'Latitude': 'Latitude', 'Longitude': 'Longitude',
                         'Clean Alternative Fuel Vehicle (CAFV) Eligibility': 'CAFV Eligibility'},
                 title='Scatter Plot of Longitude and Latitude')


fig.update_layout(xaxis_range=[-130, -60], yaxis_range=[20, 50])


fig.show()

The scatter plot shows the distribution of electric vehicles by longitude and latitude, with the points colored by CAFV eligibility. The insights that we can draw from this plot are:

There is a clear cluster of electric vehicles in the northwestern United States.

There are also smaller clusters of electric vehicles in the northeastern United States.

There are fewer electric vehicles in the southern and central United States.

Because almost all vehicles in the dataset are from Washington, the points for every CAFV eligibility category concentrate in the northwest. The plot reflects where the data comes from rather than a relationship between latitude and CAFV eligibility.

In [36]:
fig = px.scatter(df_copy, x='Longitude', y='Latitude', color='Electric Vehicle Type',
                 labels={'Latitude': 'Latitude', 'Longitude': 'Longitude',
                         'Electric Vehicle Type': 'Electric Vehicle Type'},
                 title='Scatter Plot of Longitude and Latitude')

# Set the plot limits for longitude (x) and latitude (y)
fig.update_layout(xaxis_range=[-130, -60], yaxis_range=[20, 50])

# Show the plot
fig.show()

The scatter plot shows the distribution of electric vehicles by longitude and latitude, with the points colored by electric vehicle type. The insights that we can draw from this plot are:

Battery electric vehicles (BEVs) cluster most clearly in the western and northwestern United States.

There are fewer BEVs in the southern and central United States.

Plug-in hybrid electric vehicles (PHEVs) are more evenly distributed across the United States.

More BEVs appear at northern latitudes, but this mainly reflects that the dataset is dominated by Washington rather than a general link between latitude and BEV ownership.

In [37]:
states = list(df.groupby('State').count().sort_values(by='City',ascending=False)['City'].index)
values = df.groupby('State').count().sort_values(by='City',ascending=False)['City'].values
In [38]:
data = pd.DataFrame(df.groupby('State').count().sort_values(by='City',ascending=False)['City'])
In [39]:
data = data.reset_index()
In [40]:
data.columns = ['State','Count']

Choropleth to display the number of EVs by state

In [41]:
fig = px.choropleth(data,
                    locations='State',
                    locationmode="USA-states",
                    color='Count',
                    color_continuous_scale="blues",
                    scope="usa")

fig.show()
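
The four preparation cells above build a two-column frame of state and count. As a sketch, the same frame can be produced in one chained expression (the rows below are made up). With `locationmode="USA-states"`, Plotly expects two-letter state abbreviations in the `locations` column, which is the form the `State` column already uses.

```python
import pandas as pd

# Hypothetical rows; only the column name mirrors the dataset.
sample = pd.DataFrame({"State": ["WA", "WA", "WA", "CA", "CA", "VA"]})

# value_counts sorts by count (descending); rename_axis and reset_index
# turn the result into a frame with 'State' and 'Count' columns.
state_counts = (
    sample["State"]
    .value_counts()
    .rename_axis("State")
    .reset_index(name="Count")
)

print(state_counts)
```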

Task - 3

Create a racing bar plot that animates the count of each EV make by model year.

In [42]:
# Group the data by 'Model Year' and 'Make', and calculate the count for each group
ev_make_count_by_year = df.groupby(['Model Year', 'Make']).size().reset_index(name='Count')

# Ensure all combinations of 'Model Year' and 'Make' with zero counts are included.
# Sorting the model years keeps the rows, and so the animation frames, in chronological order.
all_model_years = sorted(df['Model Year'].unique())
all_makes = df['Make'].unique()
all_combinations = pd.MultiIndex.from_product([all_model_years, all_makes], names=['Model Year', 'Make'])
all_combinations_df = pd.DataFrame(index=all_combinations).reset_index()

ev_make_count_by_year = pd.merge(all_combinations_df, ev_make_count_by_year, on=['Model Year', 'Make'], how='left')
# Assign the result instead of calling fillna(inplace=True) on a column selection
ev_make_count_by_year['Count'] = ev_make_count_by_year['Count'].fillna(0)

# Create the Racing Bar Plot using Plotly
fig = px.bar(ev_make_count_by_year,
             x='Count',
             y='Make',
             animation_frame='Model Year',
             color='Make',
             labels={'Make': 'EV Make', 'Count': 'Count'},
             title='EV Maker and Count Each Year'
            )

# Customize the layout
fig.update_layout(
    xaxis_title='Count',
    yaxis_title='EV Make',
    yaxis={'categoryorder': 'total ascending'}  
)

fig.show()
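
The animation needs one row per (model year, make) pair, including pairs with no vehicles, so that a bar does not vanish between frames. The cell above gets there with a merge; the sketch below shows an alternative using `reindex` with `fill_value=0`, on made-up rows (only the column names match the dataset).

```python
import pandas as pd

# Hypothetical rows; only the column names mirror the dataset.
sample = pd.DataFrame({
    "Model Year": [2021, 2021, 2022, 2022, 2022],
    "Make": ["TESLA", "NISSAN", "TESLA", "TESLA", "KIA"],
})

# Observed counts, indexed by (Model Year, Make).
counts = sample.groupby(["Model Year", "Make"]).size()

# Every year paired with every make, in sorted order.
full_index = pd.MultiIndex.from_product(
    [sorted(sample["Model Year"].unique()), sorted(sample["Make"].unique())],
    names=["Model Year", "Make"],
)

# Missing pairs are filled with 0 instead of NaN.
complete = counts.reindex(full_index, fill_value=0).reset_index(name="Count")

print(complete)
```

Because the counts never pass through NaN, the `Count` column stays an integer column.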

Conclusion

The electric vehicle market is growing rapidly: later model years account for far more vehicles in the dataset than earlier ones.

This growth is likely to continue in the coming years as demand for electric vehicles increases.

Tesla is the leading electric vehicle manufacturer in the dataset.

The majority of the vehicles are battery electric vehicles (BEVs), and Tesla's vehicles are predominantly BEVs.

Areas with more people appear to have more electric vehicles, although the dataset contains no population figures to measure this directly.

Most BEVs in the dataset are in the northwestern United States, because the data consists almost entirely of Washington vehicles.
